A Pipeline Japanese Entity Linking System with Embedding Features
Abstract
Entity linking (EL) is the task of connecting mentions in text to entities in a large-scale knowledge base such as Wikipedia. In this paper, we present a pipeline system for Japanese EL that consists of two standard components: candidate generation and candidate ranking. We investigate several techniques for each component, using a recently developed Japanese EL corpus. For candidate generation, we find that a concept dictionary built from Wikipedia anchor texts is more effective than methods based on surface similarity. For candidate ranking, we verify that a set of features used in English EL is effective in Japanese EL as well. In addition, because the corpus links Japanese mentions to Japanese Wikipedia entries, we can draw rich context information from Japanese Wikipedia articles to help mention disambiguation. This was not directly possible with previous EL corpora, which associate mentions with English Wikipedia entities. We exploit this advantage by exploring several embedding models that encode the context information of Wikipedia entities, and show that they improve candidate ranking. As a whole, our system achieves 82.27% accuracy, significantly outperforming previous work.
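The two-stage pipeline described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy anchor data, the two-dimensional vectors, and the fixed linear combination of prior and cosine similarity are all assumptions chosen for illustration. The abstract does not say how features are combined for ranking, so the weighted sum here is only a stand-in.

```python
from collections import Counter, defaultdict
import math


def build_anchor_dictionary(anchors):
    """Map each anchor surface form to a Counter of the entities it links to."""
    dictionary = defaultdict(Counter)
    for surface, entity in anchors:
        dictionary[surface][entity] += 1
    return dictionary


def generate_candidates(dictionary, mention):
    """Return {entity: prior}, where prior estimates P(entity | surface form)."""
    counts = dictionary.get(mention)
    if not counts:
        return {}
    total = sum(counts.values())
    return {entity: n / total for entity, n in counts.items()}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_candidates(candidates, context_vec, entity_vecs, weight=0.5):
    """Rank candidates by a weighted sum of anchor prior and embedding similarity.

    The weighted sum is an illustrative assumption, not the paper's ranker.
    """
    scored = []
    for entity, prior in candidates.items():
        similarity = cosine(context_vec, entity_vecs[entity])
        scored.append((weight * prior + (1 - weight) * similarity, entity))
    scored.sort(reverse=True)
    return [entity for _, entity in scored]


# Toy (anchor text, linked entity) pairs; hypothetical data, not from the corpus.
anchors = [("東京", "東京都"), ("東京", "東京都"), ("東京", "東京駅"), ("京都", "京都市")]
dictionary = build_anchor_dictionary(anchors)
candidates = generate_candidates(dictionary, "東京")

# Hypothetical 2-dimensional entity embeddings and a mention-context vector.
entity_vecs = {"東京都": [1.0, 0.0], "東京駅": [0.0, 1.0]}
context_vec = [0.0, 1.0]
ranking = rank_candidates(candidates, context_vec, entity_vecs)
```

In this toy setup the anchor prior alone favours 東京都, while the context vector points toward 東京駅. The embedding term therefore changes the top-ranked candidate, which is the kind of effect the abstract attributes to entity embeddings.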
Similar resources
A Neural Network-Based System for Recognizing and Classifying Named Entities in Persian Texts
Named Entity Recognition (NER) is a fundamental task in natural language processing and a subtask of information extraction. It seeks to locate and classify named entities in text into predefined categories such as the names of persons, organizations, and locations, expressions of time, etc. Named Entity Recognition for English texts has been researched widely for the past years, howev...
CUNY-BLENDER TAC-KBP2010 Entity Linking and Slot Filling System Description
The CUNY-BLENDER team participated in the following tasks in TAC-KBP2010: Regular Entity Linking, Regular Slot Filling and Surprise Slot Filling task (per:disease slot). In the TAC-KBP program, the entity linking task is considered as independent from or a pre-processing step of the slot filling task. Previous efforts on this task mainly focus on utilizing the entity surface information and the...
Toward Socially-Infused Information Extraction: Embedding Authors, Mentions, and Entities
We present a novel neural network model for entity linking that exploits distributed representations of users, mentions, and entities. • Our system leverages social network structures by utilizing entity homophily to improve entity disambiguation. • Our neural network model is on par with the tree-based model (Yang and Chang 2015) with surface features, but it is much easier to add additional i...
A Novel Approach to Conditional Random Field-based Named Entity Recognition using Persian Specific Features
Named Entity Recognition is an information extraction technique that identifies named entities in a text. Three methods have conventionally been used to extract named entities from a text: rule-based, machine-learning-based, and hybrids of the two. Machine-learning-based methods perform well in the Persian language if they are trained with good features. To get good performanc...
Domino: SAIC's English Entity-Linking System
The Domino system was SAIC’s student-intern entry to the English Entity-Linking track of the 2012 TAC-KBP competition. This paper describes how Domino was developed using components from the CUNY-BLENDER system and discusses the features and rules that were added to Domino. It analyzes Domino’s performance, and suggests ways in which we plan to improve the system in the future. 1.Building the D...